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, ^, Abstract 

In the paradigm of computer experiments, the choice of an experimental design is 
^ an important issue. When no information is available about the black-box function to 

be approximated, an exploratory design has to be used. In this context, two dispersion 
00 criteria are usually considered: the MINIMAX and the maximin ones. In the case of a 

hypercube domain, a stanclarcl strategy consists of taking the MAXIMIN design within 
the class of Latin hypercube designs. However, in a non hypercube context, it does 
not make sense to use the Latin hypercube strategy. Moreover, whatever the design is, 
the black-box function is typically approximated thanks to kernel interpolation. Here, 
we first provide a theoretical justification to the MAXIMIN criterion with respect to 
!• kernel interpolations. Then, we propose simulated annealing algorithms to determine 

. !^ MAXIMIN designs in any bounded connected domain. We prove the convergence of the 

different schemes. Finally, the methodology is applied on a challenging real example 
H where the black-blox function describes the behaviour of an aircraft engine. 
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1 Introduction 



A function / : — )• M is said to be an expensive black-box function if / is only known 
through a time consuming code. It is assumed that E is enclosed in a known bounded 
set of M'^. E is not necessarily a hypercube domain or explicit. E may be given through 
an indicator function only. It is assumed that testing if a point belongs to E is not 
burdensome. Hence simulate a point in E can be performed by sampling rejection. In 
order to deal with some concerns such as pre-visualization, prediction, optimization and 
probabilistic analysis which depend on /, an approximation of / is usually used. This 



is the paradigm of computer experiments ( Santner et al. , 2003 , Fang et al. , 2005 ) where 
the unknown function / is deterministic. An approximation of / can be obtained using 



a kernel interpolation method (Schaback, 1995, 2007). For the corresponding covariance 



structure, the Best Linear Unbiased Predictor (BLUP) given by Kriging metamodeling 
(Matheron, 1963) provides the same approximation of /. Due to its flexibility and its 



adaptivity to non-linear functions, Kriging is one of the most used approximation method 
by the computer experiments' community. In our tests, we have observed that Kriging 
works well for approximating a large class of non linear functions when the input space is 



of dimension less than ten. For more details on Kriging, one can see for instance: Sacks 



et al.|(|1989b|a[ ) ;|Cressie|(|1993[);|Laslett| ( |1994[ ); [Stei^ ( |1999[[2002l ) and Sudjianto| ( [2005| ); 
Josephl (|2006|); Iden Hertog et al.l (|2006p. 



The kernel interpolation methodology needs the choice of a kernel K (kernel satisfying 
some conditions detailed below) and a design X = {xi, . . .xat} where the function / is 
to be evaluated, giving {/(xi), . . . , /(xtv)}- As it is well-known, a space of functions TIk 
is associated to K. If the function / belongs to Hk, we can control the pointwise error 
of SK,x.{f), the interpolator of / on X. In this deterministic paradigm (the function / 
is not random), there are essentially two main kinds of properties that a design can have 



(Koehler and Owen, 1996): 



projection properties such as Latin hypercube designs (McKay et al. , 1979) or its 



generalization Latin hyper-rectangle sampling which allows for non-equal cell prob- 



abilities ( Mease and Bingham , 2006 ) ; 



exploratory properties which are warranted by criteria such as: 
— MINIMAX which means that the design has to minimize 



sup mm y 



MAXIMIN which means that the design has to maximize 

(5x = min llxj — xJI . 



(1) 



(2) 
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Moreover, between two designs Xi and X2 such that Jxi = ^X2) using the 
MAXIMIN criterion, we choose the design for which the number of pairs of points 
with distance equal to is minimal. 

— Mean Squared Error (MSE) based criteria. These criteria are linked to the 
MSE of the BLUP in the context of Kriging metamodeling. Designs can be 
sought to minimize the Integrated Mean Squared Error (IMSE) of the BLUP 
over the domain E or to minimize the maximum of the MSE over E. 

The MINIMAX and maximin criteria have been proposed for Kriging metamodeling by 



Johnson et al. (1990). Sacks et al. (1989a) have detailed the MSE-based criteria. For 



others criteria, one can see ( Bursztyn and Steinberg , 2006 ) . 



For kernels defined by radial basis functions, Schaback ( 1995 ) and Madych and Nelson 



( 1992 ) have shown that the MININAX criterion /ix explicitly intervenes in an upper bound 
on the pointwise error between / and s_ft:,x(/)- The upper bound has the form G(/ix) 
where G is an increasing function M_|_ — )• M_|_. Here, we prove that /ix ^ ^x. and then, that 
a MAXIMIN design also provides an uniform upper bound of the pointwise error. 

MINIMAX and IMSE criteria are costly to evaluate and, typically, the maximin criterion 



is privileged. In the case where E is & hypercubic set, Morris and Mitchell (1995) provided 



an algorithm based on simulated annealing to obtain a design very close to a maximin 
Latin hypercube designs, (the criterion optimized is not exactly the maximin one). For the 



two-dimensional case, van Dam et al. (2007) derived explicit constructions for maximin 
Latin hypercube designs when the distance measure is L^o or Li. For the L2 distance 
measure, using a branch-and-bound algorithm, they obtained MAXIMIN Latin hypercube 
designs for < 70. 

For some non hypercubic domains, the use of projection properties can lead to poor 
exploratory designs. For instance, if E = { (x^, j;^) S [0, 1]^ : > x^} then, the only Latin 
hypercube design is on the line = x^ . Moreover, in some cases, they are impossible to 
satisfy. Therefore, we focus on exploratory properties only. 

In the case of an explicit constrained subset of [0, 1]*^ 



Stinstra et al. 



( 2003 ) proposed an 



algorithm based on the use of NLP solvers. Here, we propose some algorithms to achieve a 
MAXIMIN design for general (even not explicit) non hypercubic domains. Our schemes are 
based on simulated annealing. Our proposals are not heuristic, we study the convergence 
properties of the schemes proposed. 

Recall that the simulated annealing algorithm aims at finding a global extremum of a 
function by using a Markovian kernel which is the composition of an exploratory kernel 
and an acceptance step depending on a temperature which decreases during the iterations. 



In some presentations (e.g. Bartoli and Del Moral (2001)), the simulated annealing algo- 



rithm is based on a Metropolis-Hastings algorithm (Chib and Greenberg 1995). In that 



case, it can be proved that the resulting Markov chain tends to concentrate on a global 
extremum of the function to be optimized with high probability when the number of it- 
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erations tends to infinity. Here, we introduce a simulated annealing scheme based on a 
Metropolis-within-Gibbs algorithm (Roberts and Rosenthal, 2006) and prove its conver- 
gence. 

The paper is organized as follows, in Section 2 the kernel interpolation method is 
described and a theoretical justification of the minim AX and maximin criteria is provided 
thanks to the pointwise error bound between the interpolator and the function /. Then, in 
Section 3 the simulated annealing algorithm is presented. A proof of convergence is given. 
Section 4 deals with the case where E is not explicit and can only be known by an indicator 
function. Two variants of the algorithm are proposed and their theoretical properties are 
stated. In Section 5, the algorithms are tried on some examples and practical issues are 
discussed. Finally, in a last Section, the methodology is applied on a real example for 
which the domain is not an hypercube. 



2 Error bounds with kernel interpolations 

A kernel is a symmetric function K : E x E ^ M where E is the input space which is 
assumed to be bounded. The kernel has to be at least conditionally positive definite to be 
used in kernel interpolation. For the sake of simplicity, kernel interpolation is presented 
for positive definite kernels only. denotes the space of functions from E to M. 

Definition 2.1. A kernel K is positive definite if 

V(Cl,Xi)...(CiV,X7v) GMxE, Y1 OCmi^(x/,X™) >0. 

l<l,m<N 

For any x £ E, let i^x denote the partial function x' £ E K{x, x') G M. The linear 
combinations of functions taken in {K^,x £ E} span a functional pre-Hilbert space J-k 
where 

L M ML 

l=\_ m=l m=l 1=1 

is the scalar product. Aronszajn's theorem states that there exists a unique space Hk 
which is a completion of J-k where the following reproducing property holds 

V/ £nK,^e E, /(x) =< f,K^ >n^ . 

Hk is called a Reproducing Kernel Hilbert Space (RKHS). 

Let us denote by SK,x.{f) the orthogonal projection of / on T-Ik(X.) = spanjKxi , • • • , ^xat} 
(/ is assumed to be in Tix', X = {xi, . . . , xjy} and K are given). 
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Lemma 2.1. SK,ii{f) interpolates f on X. Among the interpolators of f on X, SK,x.if) 
has the smallest norm: Si^_x(/) is the solution of the following problem 



mi%eWK \\9\\nK 
ai^k) = /(xfc), k = l, 



.N 



This interpolator corresponds to the BLUP in the Kriging hterature (Cressie 1993 Stein 



2002). It has a Lagrangian formulation. 



Lemma 2.2. For any x £ E, 



N 



•SX,x(/)(x) = ^'Ui(x)/(Xj 



1=1 



where the functions {ui : E 



anc 



G T-Lk(X.) are such that, \/ 1 < i < N , 

Ui{Xi) = 1 

Ui{xk) = ifk^i ' 



i^[X,x] = K[X,X\U{x) 





/ ni(x) \ 


( i^(xi,x) \ 


where C/(x) = 


: Li^[X,x] = 


: and i^[X, X] is such that 






[ i^(xjv,x) / 


(i^[X,X])i<„-<iv = i^(x„x,-). 





Hence, the pointwise error can be bounded from above, Vx € E 

N N 

\'Hk\ 



|/(x)-5i^,x(/)(x)| = I </,i^x-5^n,(x)i^x, I < ||/||^^||Kx-5^n,(x)KxJ|«^. 



i=l 



i=l 



Let -Px(x) = ll-fTx — '}2d=i ^i(x)-?^Xill'HA'- -^x depends only on the kernel K and on the 
design X. Px corresponds to MSE of the BLUP. When it is integrated on the domain E, 
we obtain the IMSE criterion. As already explained, the IMSE or the the maximum MSE 
can be minimized to determine an exploratory design. However, it depends on the kernel 
and it is costly to compute. 

For some kernels K defined by iir(x, x') = (^(||x — x'||2) where ||x||2 = \jYli=i (^*)^) 
(x E E) and (p : M+ — )• M, Schaback (1995) provides the following upper bound on Px;(x): 



Px(x) < GkM . 

The quantity /ix = supyg^; mini<j<jv ||y — Xj|| is associated to the MINIMAX criterion. Gk 
is an increasing function, obviously depending on the kernel. The smoother the kernel K, 
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the faster Gxih) tends to for h — > 0. For instance, the Gaussian kernel is defined by 
ir(x, x') = e^^'l''~^'l'2 where is a real positive parameter; in that case, Gxih) = Ce~^^^^ 
where C and 6 are constants depending on 6. By choosing a design X with a low /iX; 
one thus ensures a smah pointwise error independently of the chosen (radially symmetric) 
kernel. This justifies using minimax ([T]) optimal designs. The next proposition shows that 
a bound on the pointwise interpolation error is still guaranteed when a maximin optimal 
design is used. 

Proposition 2.1. // X is a maximin design, E is enclosed in the union of the balls of 
center x^ and of radius 5x = ^^^i<i,j<N — 

Proof 

This proposition is proved by contradiction: let X be a maximin design and let us suppose 
that there exists a point xq £ E such that ||xo — Xj|| > Sx. for all Xj G X. 
Let (xjpjXjQ) € X^ be a pair of points such that ||xjQ — Xj-qH = 6x and construct the design 
X' = {xi . . . Xjg_i, xq, XjQ+i . . . xat} where the point Xjg is replaced by the point xq. 
^X' ^ and, in the case 6x' = ^Xi is better than X with respect to the maximin 
criterion because the X' contains less pairs of points for which the distance is equal to 5-x.- 
Thus, there is a contradiction because X is not a maximin design. Hence, for any x £ E, 
there exists a Xj S X such that ||x — Xj|| < (5x. Q 

As a consequence of this proposition, if X is a maximin design, 

|/(x)-Si^,x(/)(x)|<||/||«,,GA'(<5x). 

This result justifies theoretically the use of maximin designs when a kernel interpolation 
is used as an approximation of /. Besides it proves that the interpolation done thanks to 
a maximin design is consistent since (5xjv tends to for a sequence (XAr)jvgN of maximin 
designs of respectively points. 

3 Computing maximin designs 

In this Section, we propose an algorithm to provide a maximin design with points in 
any set E enclosed in a bounded set. It is based on a simulated annealing method. It 
aims at finding the global minimum of the function U : E^ ^R+, C/(X) = diam(£;) - 5^ 
where diam(£') is the diameter of the set E (diam(i?) = maXx,x'e-B l|x — x'||). It is obvious 
that to minimize U is equivalent to maximize 5 : X i— t- 5x- 

The initialization step consists of simulating uniformly a lot of points in the domain 
E (using sampling rejection) and of calculating the corresponding empirical covariance 
matrix denoted by S. At the end of the initialization step, we randomly keep A^ points, 
denoted by X^ = {xf\...,xP}. An inverse cooling schedule j3 : t ^ j3t (i.e. {fit)t 
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is an increasing positive sequence and lim^^oo A = 00) is chosen in order to ensure the 
convergence of the algorithm. The paramater r is a variance parameter which is ahowed 
to change during the iterations but, at each iteration, r is such that tq > r > r^m- A 
paramater 7 > is fixed to be a very small integer to prevent from numerical problems. 
We propose to iterate the following steps, for f = 1, . . .: 



Algorithm 1. 

1. A pair of points (x , x^*'' ) is drawn in X*^*) according to a multinomial distribution with 
probabilities proportional to l/(||xj — Xj|| + 7) ; 

2. One of the two points is chosen with probability \, it is denoted by-x^^ ; 

3. A constraint Gaussian random walk is used to propose a new point : 

xf<^~AAf(xW,rS) 

^pr°P is constrained to belong to E. The proposed design is denoted by XP^°p = 

4. X(*+i) = XP'-'^P with probability 

(/ \ n CVP^op 'V'(t)\\ 

l.exp(-A,t/(X-)-L.(X<'),))3^^||j^j, 

ot/7e/w;se X(*+i) =X(*). 



The idea behind this proposal is to force the pairs of points which are very close to be 
more distant. 

In order to explicit the proposal kernel <5r(X, dY) where dY is an infinitesimal neigh- 
borhood of the state Y, let us introduce some notations: 

• dfj = l/{\\xi-yij\\+j), 

• ~ l^k,i.k<l ^kp 

• (f){.\^jL,S) denotes the Gaussian pdf with mean /x and covariance matrix S, 

• G^^s = 't^iy\l^i S)dy denotes the normalization constant associated to S) on 
the domain E. 
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Let A denote the Lebesgue measure on the compact set E (A(dx) = l£;(x)Leb(dx) where 
Leb is the Lebesgue measure on M"^) and let, for any ^, Dir^(-) denote the Dirac measure 
with mass at ^. For a given X = {xi,...,X7v} G , let X_j be such that X_j = 
{xi, . . . , Xj_i, Xj+i, . . . , xat}. The proposal kernel reads as, for X G E'^, Y G 



t>d\N 



N 



Qr{^,dY) = ^(7,,,(X, Y)A(dyi)Dirx_,(dY_0 , 
where for i = 1, . . . , N , 



i=l 



Q.,(X,Y) = 0(y,|x„rS)G,^;,s ^ 



1^^ 

2D^ 



Let us now describe the global kernel associated to Algorithm [TJ It obviously depends 
on the parameters /3 and r. It reads as. 



N 



Kf)^{^,dY) = <j^a^,,,,(X,Y)g,,i(X,Y)A(dyi)Dirx_,((iY_,)| 

/ Va/3,,,i(X,Z)g,,i(X,Z)A(dz,)Dirx_,((iZ_,)^ Dirx(dY) , 



. i=l 



+ 1 



where o« fX Yl - min 1 ^^p'.-p'.'^ v ^^-"'r.^i f • _ i at 

For /? and r fixed, the algorithm we propose is a random scan Metropolis- within-Gibbs 



algorithm (Roberts and Rosenthal, 2006) for which the selection probabilities depend on 



the current state of the Markov chain. Let Xn denote the Lebesgue measure on the 
compact set E^ {X]\[{dX.) = l£;jv(X)LebAr((iX) where LebA? is the Lebesgue measure on 
(M'^)'^). The target distribution that corresponds to our random scan Metropolis-within- 
Gibbs is the Gibbs measure defined by 

fi^idX) = exp(-/3C/(X))Z^iA^(dX) , 



Bartoli and Del Moral 



where = f e ^^^^^Xi\i{dY). In the presentation given by 
the simulated annealing algorithm is based on a Metropolis-Hastings algorithm ( Hastings 



(2001), 



1970) and it is assumed that there exists a measure for which the proposal kernel is 



reversible. In that case, if the target distribution is defined according this measure, the 
ratio of proposal densities does not appear in the acceptance rate. In our case, there does 
not exist a reversible measure for Q^, that is why the ratio of the proposal densities is 
needed. 

We will show the convergence of Algorithm [T] following the proof given in Bartoli and 
Del Moral (2001). Some Markov chains convergence results are used. Indeed, at a fixed 
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temperature, the Markov chain tends to a stationnary distribution which is the Gibbs 
measure. As the temperature decreases, the Gibbs measure concentrates on the global 
extremum of the function. There are typically two properties that are required in the 
proof: the irreducibility and the invariance of the transition kernel with respect to the 
Gibbs measure. As already explained, for /3 and r fixed, this scheme is a random scan 
Metropolis-within-Gibbs algorithm with as target. Indeed, using the detailed balance 
condition, we can easily verified that is iT^^-invariant. The irreducibility of this scheme 
comes from the following result. 

Proposition 3.1. For all non- decreasing sequence < Pi < ■ ■ ■ < Pn and for a sequence 
{Ti)i<i<N such To > Tj > Tmin, wc havc V(X, ^) G X M{E^) 

{Kfl^^ ■ ■ ■ Kfl,^){^, A) > a^ie-^/'--^(^)A^(^) 

where aAi > and osc{U) is the smallest positive number h such that for all X, Y in 
E^, U{Y) - U(X) < h. 

Proof 

Let us prove first that, for all X G E'^ and i = 1,...,N, gT-,i(X, .) > qmin > and 
gT-_j(X, .) < qmax, Qt(X, .)-almost everywhere on E^ . 
The fact that g^(X, .) < 

Qmax is true since the normalization constants are lower-bounded, 
the Gaussian densities are uniformly bounded since tq > t > Tmin > and all the other 
terms can be upper bounded by 1. 

The other assertion is only true Qt(X, .)-almost everywhere on E^ . It means that the 
lower bound on g^i(X, Y) holds when X and Y are such that X_j = Y_j and are both 

in E^. 

The following lower bounds are used: 
• G-Ve > 1> 
V > 



2D^ - iv(diam(E)+7) ' 
• </'(y|x,rS) > ^^-py^^^y^exp(-ir-^„diam(£;)2^) where ^ is the largest eigen- 
value of S~^. 

Qmin > is found by multiplying these expressions and, for any i = 1, . . . , iV, it is a lower 
bound of qT,iO^, Y) which does not depend on r and on the states if X G E^ and Y G -E^ 
have at least — 1 points in common. 

By definition of osc{U), for all X G E^ and for all (5t(X, .)-almost everywhere Y G E^ , 

a^,,,,(X,Y)>e-'^°-(^)^. 

Qmax 
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Thus for (X, A) eE^ X M{E^), K^^^{'X,A) > e-^"'"^^^ ^Qr{^, A). 

Then, for all non- decreasing sequence < /3i < . . . < /J^v and for a sequence (Tj)i<j<^ 

such To > Ti > Train, we have V(X, A)eE^ X M{E^) 

/ \ ^ 

\ Qmax / 

/ \ 

-Ar/3]vosc(!7) / Imin 



\ Qmax / 
/ \ ^ 



□ 

[/ : E'^ — )• M+ is lower bounded with respect to Aat (the Lebesgue measure on the com- 
pact set E'^). We use the following notation m = supa[a; AAr({X; C/(X) < a}) = 0], by 
definition Xn{{^; U{X.) < m}) = 0. 

Moreover, for all e > 0, we define = {X G i?^; U(X.) < m + e} which is clearly such 
that Xn{Ux J>0 and U'^^^ = {X; c/(X) > m + e}. 



As stated in BartoH and Del Moral (2001) 



Ve>0, hm /.^([/^ ) = 1, 



Following the proof of Theorem 4.3.16 in Bartoli and Del Moral (2001) and using Propo- 



sition 3.1, we obtain the convergence of this algorithm. 



Theorem 3.1. If the sequence (t„)„>o is such that Vn > 0, tq > t„ > T^m > and if 

/3n = ^ log(n + e) , C> NosciU) , 

we get 

Ve > 0, lim P^(X(") eUl^) = l 

where {X^");n > 0} denotes the random sequence we get from Algorithm^ with an initial 
probability distribution r] on E^ . 

Unfortunately, the function U is not regular enough to estimate the convergence speed. 

4 Variants of the algorithm 

In the case where E is not explicit, the normalization constant Gm,s of a Gaussian dis- 
tribution with mean m and covariance matrix S cannot be computed. Hence, the ratio 
of densities of proposal kernels is not tractable. In that case, we first propose to use as a 
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proposal an unconstrained Gaussian random walk. The steps [3] and |4] of Algorithm [T] are 
modified. 



Algorithm 2. The first steps until step\^ are the same. 
Step is replaced with 

3bis. A Gaussian random walk is used to propose a new point : 

xr^~AA,(xW,rS). 

And step is replaced with 

4bis. lfyj>^°P G E^, X(*+i) = XP™P with probability 

min (l.exp (-A(f/(X-) - f/(X<'))))) gjl^^g!^) 
Otherwise X(*+i) =XW. 



The proposal kernel corresponding to this algorithm where the Gaussian random walk 
is not constraint to remain in the domain E reads as: for any X G E^ , Y G (M'^)''^, 

AT 

Q,(X,dY) = J^g,,i(X, Y)A(dyi)Dirx_,(dY_0 , 

i=l 

where for i = 1, . . . , A^, 

g.,,(X, Y) = 0(y.|x„rS)) I 

As for Algorithm [l| if (r„)„>o is such that Vn > 0, tq > t„ > > and if /3„ = 

^ log(n + e) with C > Nosc{U), the random sequence we get from Algorithm 2 gives 
Ve > 0, lim„^ooIPr,(X(") € U{ ) = 1. However, since a point can be proposed outside of 
the domain E, this algorithm can suffer from a lack of efficiency. Another solution is to 
use the first algorithm without the ratio of densities of proposal kernels. 
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Algorithm 3. The first steps until step^ are the same than in Algorithm^ 
Step is replaced with 

4ter. X(*+i) = XP'^°P with probability 

min (l,exp (-^([/(X?'™^') - C/(xW))))) , 
otherwise X(*+i) =XW. 



The global kernel associated to Algorithm [3] is 

(X, clY) = 6^,,(X, Y)Q,(X, dY) j^^ 6^^,(X, Z)Q,(X, dZ)^ 6^{dY) , 

where f)/3^T-(X, Y) = min (l, cxp(-fl[/(X)) ) • '^^^ measure is not ii'^^-invariant. Hence, 



, exp(-/3l/(X)) ; • ^^^^ iiicaouic ^x/^ id iiul ^v^ , 

we cannot use the Markov chain convergence theory to obtain a result similar to Theorem 
However, /U^ is /C^i^-irreducible and we can easily state the following proposition. 



3.1 



Proposition 4.1. For all non- decreasing sequence < /3i < . . . < /^tv (md for a sequence 
iTi)i<i<N such tq > Ti > Tmin, wc have V(X, A) G X ^{E^) 

{Kfl,^ . . . Kt%,)(X, A) > aAze-^'P-^^^^^XMiA) 

where a^s > 0. 



As shown in Locatelli (1996), this proposition leads to the fact that a design reaching 
a neighborhood of a global maximum of 5x can be achieved in a finite number of 
iterations almost surely using Algorithm |3j 

Proposition 4.2. For any e > 0, if, Vn G N, /3„ < ^ log(n + e) with C > Nosc{U) the 
expected time until the first visit in U^^ is finite. 

Proof 

The expected time until the first visit in U^^ is equal to 

oo 

^A:P(X«,...,XW ^ C/^^|XW ^ t/^^) XP(X('^+1) G C/^^ |xW, . . . , X^^) ^ t/^^). 
fc=i 

The aim is to find an upper bound in order to show that it is finite. The second probability 
in the argument of the series is limited from above with one. The first probability in the 
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argument of the series is the probabihty of never visiting U^^ in the first k steps. 
It can also be written as: 



Thanks to Proposition 4.1 it holds that 

P(at least one visit in U^^ in the first A^steps) > P(x(^) ^ U^J > OAsXNiUlJ exp(-/3AriVosc([/)) . 
Thus, 

P(X«, . . . , XW ^ C/^^|X(0) (^UI^)<1- aA3XN{UlJ eM-I^NNoscm . 
And in a similar way, 

P(X(^^+1),...,X(('^+1)^) ^ C/^^|XW,...,X(^^) ^ t/^^) < l-a^3Aiv(^I,)exp(-/3;v(m)^osc(C/)). 
Hence, the expected time before the first visit in U^^ can be bounded from above by 

oo [k/N\ 

H {l-aA3XN{Ul^)eM-f3Nir+i)Nosc{U))) . 

k=l i=l 

As log(l — 2x) < —X if < X < 1/2, the previous sum is bounded by 

[k/N\ 



2^A;exp - 2^ ^ ^exp(-/3^(i+i)A^osc(?7)) . 

fc=l \ i=l I 

If (3k is chosen such that Pn = ^ log(n + e), (C > Nosc{U)), the sum becomes 



2^Kexp - 2^ 



.^1 V - -1 ^^^^'^"^ 

which can be bounded above by 

g ^ 1^ 2 L^/'^J I (fcTiV) J ) ' 

which is a convergent series. □ 

Since the best design ever found during the iterations is saved, the previous proposition 
provides a theoretical guarantee for Algorithm [3] Moreover, as a direct consequence, 
the Markov chain defined by Y^") = X^*") with t„ = arg mini<(<„ [/(X^*)) is such that 
lim„_>oo lPr?(Y(") E ^Ajy) = 1- In practice, n is finite and the chosen design is Y^") and not 
X("). Also, we can consider that the previous convergence result is sufficient. However, 
this kind of result can be obtained with any algorithm producing a Markov chain which 
well visits the space of states even if the temperature is fixed! Algorithm [3] directly derives 
from Algorithm [T] and we can expect that they have similar behaviors. 
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5 Numerical illustrations 



First, the three algorithms are tested on three different toy cases: a design with 100 points 
in [0,1]^, a design with 250 points in [0,1]^ and a design with 400 points in [0,1]^. In 
these hypercubic cases, the normahzation constants can be computed and Algorithm [T] 
can be used. In each case, 100 calls are made to one million iterations of each algorithm. 
We observed that the chains produced by the algorithms remain quite stationnary after 
one million iterations. The inverse cooling schedule is /3„ = (I/Tq) log(n) and the variance 
schedule is Tn = tq/ y/n. 

In order to choose Tq, a lot of designs with N points can be drawn uniformly in E. 
Then, a median of 5x; the minimum distance between pair of points in these designs, 
is computed. Thus, it is a mean to access to an order of magnitude of 6x. when X is 
uniformly distributed. A fraction of this value is a good choice for Tq according to our 
tries. Note that it is much lower than the one required in the convergence theorem. 

For To, we suggest to use tq = Vol(£;)/7V^/'^ where Yol{E) is the volume of E or an 
upper bound of this volume. Clearly, tq which parametrizes the random walk variance 
should not exceed \ol{E) and the previous formula is derived from analogy with a grid. 
For a d-dimensional space, the number of points in a grid reads as N = k"^ where k is an 
integer which corresponds to the number of projected points on each axis. Thus, it seems 
reasonable to divide the volume of the domain by k or more generally by N^^^ to ensure 
a good exploration of the space. 

Figures [l] [2] and [3] present the results. For each algorithm, the boxplots of the best 
solutions to the maximization of 6x. over one million iterations (boxplots are constructed 
using 100 replicates) are given. Algorithms [l] and [3] give the best results. Algorithm [2] 
suffers from the fact that the proposal can be outside of the domain. The computation 
of U is the most time consuming step of these algorithms, that is why the comparison is 
based on the number of iterations. 

Other cooling schedules than the ones which have theoretical guarantees can be tried. 
It seems that they can lead to satisfying results which are even better than the ones 
obtained with the log schedule. Since the results depend too much on the examples, it 
is quite hard to state a general rule. However, a schedule /?„ = l/TQ^/n is robust to a 
bad choice in Tq and a schedule r„ = To/y/n performs quite well. The variance decreasing 
schedule is set to freeze for a given n, thus it satisfies the boundedness assumption of the 
theoretical results. 

Finally, in the domain Et = {{xi,X2) G [0, 1]^ : xi > X2}, four strategies are compared 
to obtain a design with 100 points: taking a design whose points are realizations of a 
uniform distribution on Et, taking a lhs-maximin design in [0, 1]2 with 200 points thanks 
to the algorithm of Morris and Mitchell ( 1995 ) and keeping the subset which is in Et only 
(it gives what we call a truncated LHS-MAXiMiN design), using a Sobol' sequence of 100 
points constrained to be in Et, making use of Algorithm[3j Table [T] displays some statistics 
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on values of 6x. for 100 replicates. Only the mean is given for the Sobol' sequence since 
it is a deterministic strategy. The truncated lhs-maximin strategy provides designs with 
approximately 100 points (between 93 and 107 on the 100 replicates). 





Mean Variance Min Max 


Uniform 


0.0048 8.2-10-*^ 4.0-10-^ 0.013 


Truncated lhs-maximin 


0.034 8.2 • 10"*^ 0.025 0.039 


Sobol' sequence 


0.011 N/A N/A N/A 


Algorithm 3 


0.080 7.8 • 10-8 0.079 0.081 



Table 1: Comparison of designs in Et based on the values of 5x for 100 replicates 



6 Application to a simulator of an aircraft engine 

The behavior of an aircraft engine is described by a numerical code. A run of the code de- 
termines if the given flight conditions are acceptable and, provided they are, computes the 
corresponding outputs. The function which associates the outputs to the flight conditions 
is denoted by /. It is accessible only through runs of the code. It is a black box function 
and a run is time-consuming. A thousand calls to the code are run in ten minutes. The 
goal is to incorporate the modelization of the engine in a global model of an aircraft for 
a preliminary design study. Since the simulator of the engine is too burdensome, we are 
asked to compute an approximation of / which can be included in the global model. 

The acceptable flight conditions represent the domain of definition of /, denoted by E. 
Outside E, the code cannot provide outputs since the conditions are physically impossible 
or the code encounters convergence failures. E is not explicit, as explained above we 
have to run the code to know if the flight conditions are acceptable. Therefore, we need 
to estimate E (the indicator function associated to E). This is not our goal here. E is 
included in a known hypercube (lower and upper bounds are available on each of these 
variables). Using other prior information and some calls to /, a binary classification tree 



has been built to determine an estimate of the indicator function of E (Auffray et al 



2011). This method works quite well and leads to a misclassification error rate around 
0.5%. The resulting domain is not an hypercube. 

In the following case study, only the flow rate output is focused on. The flight con- 
ditions are described by ten variables such as altitude, speed, temperature, humidity... 
A variable selection procedure has shown that only d = 8 input variables are useful for 
prediction (jAuffray et al. 2011). Hence, the considered function to be approximated is 
f : E C R'^ -)■ R. 

A MAXIMIN design is drawn thanks to lO''^ iterations of Algorithm [sj The initial 
temperature Tq and the initial variance tq were chosen as described in the previous section. 
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The inverse cooling shedule was /3„ = (I/Tq 
during the first quarter of iterations and then r, 



n and the variance schedule was constant 
To/v^ 



'n- 107/4. 

Approximations of the function / are made by kernel interpolations on four different 
designs: the maximin design that was computed, a design whose points follow a uni- 
form distribution on E, a design obtained by truncating a Latin hypercube design of 
5, 000 points defined on the hypercube domain containing E and a design given by a low- 
discrepancy sequence (Sobol) constrained to be in E (see Bratley and Fox, 1988). The 



LHS is truncated by keeping only the points which belongs to E. The kernel interpolations 
are computed by the Matlab toolbox DACE (Lophaven and Sondergaard, 2002). The 



regression functions are chosen as the polynomials with degree smaller than or equal to 
two and the kernel is a generalized exponential kernel: 



K{x, x') = exp 



X 



U) 



where = 1, . . . , d are respectively the j*^ coordinates of x, x' and 9i, . . . ,9d,v 

are parameters which are estimated using the usual maximum likelihood estimators. Oth- 
ers methods such as cross validation could be used to choose these parameters. The results 
given in Section 2 only applies when the kernel is isotropic and Gaussian which means 
6»i = . . . = 6*^ = 6* and = 2. 

The four designs are sets of approximately 1 , 300 points which are included in the 
domain E according to the estimated indicator function. For the LHS, we need around 
5,000 points to get approximately 1,300 points in E. 

The function / is computed at the points of the designs. Some points have to be 
removed from the designs since the code indicates that they are not in E (recall that the 
designs were built thanks to an estimate of E) . 

Table [2] provides the performances of kernel interpolations according to the designs. 
The performances are evaluated on another set of T = 1, 300 points uniformly distributed 
in E (obtained using sampling rejection) on which the function / is also computed. If / 
denotes a kernel interpolator and {zi, . . . , z-p} is the set of test points, those quantities 
are reported: 



the Mean Relative Error (MRE), 



T 

E 



/(Zj) - /(Zi 



the Maximum Relative Error (MaxRE), 

/(zO - /(z,) 



max 
i=i,...,r 
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• the Mean Squared Error (MSE), 

T 

i^(/(z,)-/(z,))' • 

i=l 

Table [2] also contains the number of points which are actually in E and the minimal 
distance (5x between the pairs of points of the designs. To compute these distances, the 
designs were translated into the hypercube [0, 1]®. 





mRE 


MaxRE 


MSE 


Nb of Points 




Uniform 


0.49% 


5.2% 


0.63 


1284 


0.15 


LHS 


0.48% 


6.9% 


0.73 


1275 


0.14 


MAXIMIN 


0.47% 


3.5% 


0.56 


1249 


0.33 


Sobol' sequence 


0.46% 


7.7% 


0.62 


1277 


0.15 



Table 2: Comparison of performances of kernel interpolation on the different designs 

The MAXIMIN design makes the kernel interpolation more efficient especially according 
to the MaxRE criterion (although it contains less admissible points than other designs). 
As it was shown, the kernel interpolation accuracy depends sharply on the spreading out 
of the points of the design. Thus, the maximin design which ensures that any point of E 
is not far from the points of the design leads to the best performances. 
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